Deep Learning Course - Assignment 2

Q1-2. PAMAP2 Physical Activity Monitoring dataset

Submitted by: Itay Bouganim and Ido Rom

Problem Statement

The PAMAP2 Physical Activity Monitoring dataset contains data of 18 different physical activities (such as walking, cycling, playing soccer, etc.), performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor. The dataset can be used for activity recognition and intensity estimation, while developing and applying algorithms of data processing, segmentation, feature extraction and classification.

Task Description

The PAMAP2 Physical Activity Monitoring dataset contains data of 18 different physical activities, performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor. Our goal is to classify time-series window measurements from individual activities to the type of activity the measurements were taken from.

PAMAP2 dataset link

Check for existing physical GPU

1.a. Exploratory Analysis

Data collection protocol

Each of the subjects had to follow a protocol containing 12 different activities. The folder 'Protocol' contains these recordings by subject.
Furthermore, some of the subjects also performed a few optional activities. The folder 'Optional' contains these recordings by subject.

Data files

Raw sensory data can be found in space-separated text-files (.dat), 1 data file per subject per session (protocol or optional). Missing values are indicated with NaN.
One line in the data files corresponds to one timestamped and labeled instance of sensory data.
The data files contain 54 columns: each line consists of a timestamp, an activity label (the ground truth) and 52 attributes of raw sensory data.

The 3D data is given as (x, y, z) axis components (analogous to pitch, roll and yaw).

Generating data columns & Preparing the data

IMU Sensory data samples from Subject 101

A few notes on the Protocol and Optional data

Check which columns have missing data values and the amount of missing values (NaN values)

Observe missing sensory data due to sensor frequency difference

The IMU sensors' frequency is 100Hz, meaning 100 samples per second.
The HR sensor's frequency is 9Hz, meaning 9 samples per second.

We expect that for every 9 HR samples we will have 100 IMU samples, so the ratio will be approximately 100/9 ≈ 11.11.
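This sampling-rate mismatch can be checked with a small sketch (the column names `heart_rate`, `hand_acc_x` and the loading into a pandas DataFrame are assumptions for illustration, not the dataset's exact headers):

```python
import numpy as np
import pandas as pd

# Build a toy frame mimicking the 100Hz IMU / 9Hz HR mismatch:
# only roughly every 11th row (100/9 ~= 11.11) carries a heart-rate reading.
n = 1000
hr = np.full(n, np.nan)
hr[::11] = 80.0
df = pd.DataFrame({"timestamp": np.arange(n) * 0.01,
                   "heart_rate": hr,
                   "hand_acc_x": np.random.randn(n)})

# Count missing values per column, as described above.
nan_counts = df.isna().sum()
print(nan_counts)

# The fraction of missing HR rows should be close to 1 - 9/100 = 0.91.
missing_ratio = nan_counts["heart_rate"] / n
print(missing_ratio)
```

The same `isna().sum()` call on the real data frame would reveal which columns carry NaNs and how many.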

Closer observation of the heart rate data points of Sample Subject 101 (first 4200 timestamps from activity 1)

Initial data observations:

Fixing data

We can see that the data contains transient activities (activity_id = 0) and that some measurements contain NaN values.
As mentioned earlier, transient activities should be discarded.
We will solve the NaN values by using interpolation to estimate each missing value from the previous and next closest values.
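A minimal sketch of both fixes, assuming the data sits in a pandas DataFrame with `activity_id` and `heart_rate` columns (hypothetical names):

```python
import numpy as np
import pandas as pd

# Toy slice: transient-activity rows (activity_id == 0) plus sparse HR values.
df = pd.DataFrame({
    "activity_id": [0, 0, 1, 1, 1, 1, 1],
    "heart_rate":  [np.nan, 80.0, 82.0, np.nan, np.nan, 88.0, np.nan],
})

# Discard transient activities, then linearly interpolate the gaps
# between the previous and next known heart-rate readings.
df = df[df["activity_id"] != 0].copy()
df["heart_rate"] = df["heart_rate"].interpolate(method="linear",
                                                limit_direction="both")
print(df["heart_rate"].tolist())  # [82.0, 84.0, 86.0, 88.0, 88.0]
```

`limit_direction="both"` also fills NaNs at the edges of a segment, where no neighbor exists on one side.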

Closer observation of the heart rate data points of Sample Subject 101 after interpolation (first 4200 timestamps from activity 1)

1.b. Data insights and our task

The task at hand

After reviewing our data we can define our task.
The task at hand is a hierarchical classification task.
Given a measurement from the HR monitor and the IMU sensors, we want to predict what activity was performed by the subject in that particular time frame.
We will classify segments of the time series into one of the valid activities and try to determine the activity the measurements were taken from.

Ankle IMU acceleration measurements across activities

We will take a further look at how our data changes across different types of activities.

To demonstrate, we will choose the columns that are related to the hand IMU sensors.

Lying activity hand IMU sensors

Sitting activity hand IMU sensors

Running activity hand IMU sensors

Cycle activity hand IMU sensors

Ascending Stairs activity hand IMU sensors

Vacuum Cleaning activity hand IMU sensors

1.c. Self-Supervised tasks suggestions

We can perform multiple self-supervised tasks on the data to gain further insights.

Self-supervised task suggestions:

Save loaded subject data to CSV to make it easier to read later

Data Balance Analysis

Check if the data is balanced using Shannon Entropy

Although we can see from the chart above that there is variation between the representation percentages of the different classes in the dataset, we will use Shannon Entropy to quantify the imbalance.
In information theory, Shannon Entropy is an estimation of the average amount of information stored in a random variable.
For example, an unbiased coin flip contains 1 bit of information with each result. The entropy of a random variable can be estimated from a series of results.

The meaning of the entropy value is:

Therefore, dividing the entropy by log k, where k is the number of classes, gives us a measurement of balance where:
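The balance measure described above can be sketched directly from its definition (the helper name `balance_score` is our own):

```python
import numpy as np

def balance_score(labels):
    """Shannon entropy of the class distribution divided by log(k).

    Returns 1.0 for a perfectly balanced dataset and approaches 0.0
    as a single class dominates.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    entropy = -np.sum(p * np.log(p))
    return entropy / np.log(len(counts))

balanced = balance_score([0, 0, 1, 1, 2, 2])
imbalanced = balance_score([0] * 98 + [1] * 2)
print(balanced)    # 1.0 (perfectly balanced)
print(imbalanced)  # close to 0 (highly imbalanced)
```

Applied to the activity labels, a score near 1 supports the conclusion that the data is generally balanced.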

We can see that the data is generally balanced, so no further action is needed.

Calculate mean sample count from each activity for test subjects and train/valid subjects

+ Show missing activities
Activity window segments in train and validation come from the same subject set ({101, 102, 103, 104, 105, 106, 109}).

We can also observe that the train samples taken from the train subjects do not have any representation for the following activities:

We can see that the testing data we use is less balanced than the training data

Classification Problem

The problem we wish to address is the classification of consecutive time windows to the activity that was performed in each time segment.
We will use the sliding window technique with a window look-back value of 200.
Since the data was measured and interpolated to fit a sampling rate of 100Hz, meaning a measurement every 0.01 seconds,
all window segments will contain 2 seconds of data measurements.
For each 2-second window of HR-monitor and IMU sensor measurements we wish to determine the activity that produced those measurements (independent of the performing subject).
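A sketch of the windowing step (the helper name `make_windows` and the non-overlapping stride are assumptions; the report only fixes the window length of 200):

```python
import numpy as np

def make_windows(data, labels, window=200, step=200):
    """Cut a (timesteps, features) array into fixed-size windows.

    Each window keeps 200 consecutive samples (2 s at 100 Hz) and is
    labeled with its activity; windows spanning an activity boundary
    are discarded so every window comes from a single activity.
    """
    X, y = [], []
    for start in range(0, len(data) - window + 1, step):
        end = start + window
        win_labels = labels[start:end]
        if len(set(win_labels)) != 1:  # crosses an activity boundary
            continue
        X.append(data[start:end])
        y.append(win_labels[-1])
    return np.array(X), np.array(y)

# Toy stream: 600 timesteps, 4 features, two activities back to back.
data = np.random.randn(600, 4)
labels = np.array([1] * 300 + [2] * 300)
X, y = make_windows(data, labels)
print(X.shape, y.shape)  # (2, 200, 4) (2,) — the boundary window is dropped
```

The resulting (windows, 200, features) tensor is the shape an LSTM layer expects.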


2.a. Preprocessing steps and Validation Strategy

Preprocessing steps

Validation Strategy

As a validation strategy we will use a stratified train-test split in order to preserve the class representation and include all classes in the training and validation data.
So our validation data will be a stratified subset of the training window data, taken from the same range of activities.
Additionally, we will shuffle our training, validation and testing data after dividing them into windows (since the internal order within an activity time-frame window matters, but the order between different windows does not).
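The split and shuffle can be sketched with scikit-learn (sizes and the 80/20 ratio here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy windowed data: 100 windows of 200 timesteps x 4 features,
# with an imbalanced label distribution across 3 activities.
X = np.random.randn(100, 200, 4)
y = np.array([0] * 50 + [1] * 30 + [2] * 20)

# stratify=y keeps the per-class proportions identical in both subsets;
# shuffle=True reorders whole windows, which is safe because only the
# order *inside* a window matters, not the order between windows.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, shuffle=True, random_state=42)

print(np.bincount(y_train), np.bincount(y_valid))  # [40 24 16] [10  6  4]
```

Note that the class ratios (50:30:20) are preserved exactly in both subsets.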

Fit a scaler to our data in order to normalize values to range [0, 1]
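A sketch of the scaling step, assuming windowed data of shape (windows, timesteps, features); fitting on the training data only and reusing the same scaler for test avoids leakage:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Windowed sensor data: (windows, timesteps, features).
X_train = np.random.randn(50, 200, 4) * 10
X_test = np.random.randn(20, 200, 4) * 10

# MinMaxScaler expects 2D input, so flatten the window axis,
# fit on the training data only, and reuse the scaler for test data.
n_features = X_train.shape[-1]
scaler = MinMaxScaler(feature_range=(0, 1))
X_train_s = scaler.fit_transform(
    X_train.reshape(-1, n_features)).reshape(X_train.shape)
X_test_s = scaler.transform(
    X_test.reshape(-1, n_features)).reshape(X_test.shape)

print(X_train_s.min(), X_train_s.max())  # 0.0 1.0
```

Test values scaled with the train statistics may fall slightly outside [0, 1], which is expected.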

2.b. Naïve baseline solution

As a naive baseline solution we will use a stratified Dummy Classifier.
This stratified dummy classifier produces dummy predictions such that each predicted class is drawn according to its relative probability in the entire training data, i.e. the class distribution for each category.
For each row (individual measurement datapoint) in the data, the classifier will assign class x with the probability of x being chosen at random from the data.
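This baseline corresponds to scikit-learn's `DummyClassifier(strategy="stratified")`; a sketch on toy data:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Imbalanced toy labels: class 0 appears 70% of the time.
X = np.random.randn(1000, 4)
y = np.array([0] * 700 + [1] * 200 + [2] * 100)

# strategy="stratified" draws each prediction at random according to
# the class distribution observed in the training data.
dummy = DummyClassifier(strategy="stratified", random_state=0)
dummy.fit(X, y)
preds = dummy.predict(X)

# Expected accuracy is sum(p_k^2) = 0.7^2 + 0.2^2 + 0.1^2 = 0.54.
acc = (preds == y).mean()
print(acc)
```

Any real model should comfortably beat this chance-level score.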

2.c. Classical ML algorithms solid benchmark

We will fit our data (rows of measurements from each activity) into two ML algorithms in order to get a solid benchmark for the future LSTM model performance.
Each algorithm will get individual datapoints as X values (rows from activities), and the expected outcome (y value) will be the activity the measurement was taken from.
We will use: Decision Tree, and Random Forest as an extension.

Decision Tree Classification Baseline

A Decision Tree is a flowchart-like tree structure where each internal node represents a feature (or attribute), each branch represents a decision rule, and each leaf node represents the outcome.
The topmost node in a decision tree is known as the root node.
The tree learns to partition the data on the basis of attribute values.
It partitions the data recursively, a process called recursive partitioning. This flowchart-like structure helps in decision making.
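A minimal sketch of the per-row baseline with scikit-learn (the synthetic two-activity data and `max_depth` are illustrative, not the report's actual hyperparameters):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy per-row classification: each row is one sensor measurement,
# the target is the activity it came from.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)),   # "activity 0" measurements
               rng.normal(3, 1, (200, 4))])  # "activity 1" measurements
y = np.array([0] * 200 + [1] * 200)

tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X, y)
acc = tree.score(X, y)
print(acc)
```

On the real data, the same `fit`/`score` pattern applies row by row, with the activity id as the target.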

Random Forest Classification Baseline

We will try to achieve better baseline results by using multiple decision trees in the form of Random Forest classification ML algorithm.

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean/average prediction (regression) of the individual trees.
Random decision forests correct for decision trees' habit of overfitting to their training set.
Random forests generally outperform decision trees, but their accuracy is lower than gradient boosted trees. However, data characteristics can affect their performance.
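A sketch of the forest benchmark (tree count and the synthetic data are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-activity data, held-out test split as in the report.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (300, 4)),
               rng.normal(2, 1, (300, 4))])
y = np.array([0] * 300 + [1] * 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=1)

# An ensemble of 100 trees, each trained on a bootstrap sample with a
# random feature subset; the majority vote reduces single-tree overfitting.
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_tr, y_tr)
acc = forest.score(X_te, y_te)
print(acc)
```

The resulting test accuracy serves as the benchmark the LSTM model is later compared against.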

2.d. Classification using LSTM NN

In this section we will use Neural Networks in order to solve our classification problem.
We will use an LSTM layer to account for our measurement data window sequences.

About RNNs

RNNs map an input sequence to an output sequence and can be used in interesting sequence tasks, ranging from machine translation to time series forecasting.

With a simple RNN there is the problem of "vanishing gradients": the farther back the loss is propagated,
the closer the gradient gets to zero, meaning that information about long-range sequence properties is not retained. This is solved using Long Short-Term Memory units (LSTMs).

About LSTM

Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture.
Unlike standard feedforward neural networks, LSTM has feedback connections.
It can not only process single data points (such as images), but also entire sequences of data (such as speech or video).
For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition and anomaly detection in network traffic or IDSs (intrusion detection systems).

In this task we will use an LSTM in order to process the sequence of datapoints in each measurement window.
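A Keras sketch of this architecture (the unit counts, the 52-feature/12-class dimensions, and the single dense hidden layer are assumptions for illustration, not the report's exact model):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW, N_FEATURES, N_CLASSES = 200, 52, 12  # assumed dimensions

model = keras.Sequential([
    layers.Input(shape=(WINDOW, N_FEATURES)),
    # The LSTM consumes the 2-second window timestep by timestep and
    # summarizes it into a single hidden-state vector.
    layers.LSTM(64),
    layers.Dense(32, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# One forward pass on a dummy batch to check shapes.
probs = model.predict(np.random.randn(2, WINDOW, N_FEATURES), verbose=0)
print(probs.shape)  # (2, 12)
```

Each window of shape (200, 52) is mapped to a probability distribution over the activity classes.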

The following image explains how we will solve the task at hand using LSTMs

We can see that the LSTM model performed relatively well, yet it did not pass the benchmark score we got using the Random Forest ML classifier.

2.e. Pretrain model for forecasting task and fine-tune for classification

We are going to pretrain our model on the previously suggested task of forecasting the next time step's heart rate from the previous 200 measurement samples.
After the model is trained on the heart rate forecasting task, we will fine-tune it for the classification task and see if we can get better results.
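A sketch of the pretrain/fine-tune wiring in Keras (layer names, unit counts and dimensions are assumptions): the LSTM body is shared, the regression head used for forecasting is swapped for a softmax head for classification.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW, N_FEATURES, N_CLASSES = 200, 52, 12  # assumed dimensions

# Pretraining model: regress the next heart-rate value from a window.
inputs = keras.Input(shape=(WINDOW, N_FEATURES))
features = layers.LSTM(64, name="shared_lstm")(inputs)
hr_out = layers.Dense(1, name="hr_forecast")(features)
pretrain_model = keras.Model(inputs, hr_out)
pretrain_model.compile(optimizer="adam", loss="mse")
# pretrain_model.fit(X_windows, next_hr, ...)  # forecasting pretraining

# Fine-tuning: reuse the pretrained LSTM, swap the regression head
# for a softmax classification head over the activity classes.
cls_out = layers.Dense(N_CLASSES, activation="softmax",
                       name="activity")(features)
finetune_model = keras.Model(inputs, cls_out)
finetune_model.compile(optimizer="adam",
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])

probs = finetune_model.predict(
    np.random.randn(2, WINDOW, N_FEATURES), verbose=0)
print(probs.shape)  # (2, 12)
```

Because both models are built from the same `shared_lstm` layer object, the weights learned during forecasting carry over to the classifier.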

Preprocess the data for forecasting task

Create timeframe windows and y true HR values

True Forecast values (top chart) vs. Predicted Forecast values (bottom chart)

Fine-tune the forecast model for our prediction task

2.f. Improvement suggestions and first LSTM iteration summary

Why is the model fitting well on training and validation data but less for test data?

In our opinion, the 4 main reasons for the good accuracy and loss of the model on the validation data versus the accuracy and loss on the testing data are:
  1. The validation data and training data both contain timeframes from the same exact activities and were both taken from the same subset of subjects (101, 102, 103, 104, 105, 106, 109).
  2. The validation data and training data both contain samples from each class, and since we use a stratified train-test split to produce the validation data, the activity representation percentages in the training data and validation data are similar.
    The testing data was taken from different subjects.
  3. The testing data does not contain any samples for two of the classes the model was trained on (no samples for the watching TV and car driving activities).
    The training/validation data contain samples from all classes available to us.
  4. The testing data is from a completely different subject subset (107, 108) and therefore does not contain timeframes that intersect with the timeframes the model was trained on.

Improvements suggestions going forward

2.g. Improving the model

We will implement the following improvements:

  1. Increasing the node count of the LSTM layer to increase the look-back memory
  2. Adding a 1D Time Distributed Convolutional layer before the LSTM layers to improve capturing of the dominant and average features before passing them to the LSTM layers
For both of these improvements we will additionally add more dense layers to try to capture more complex features.

Increase LSTM nodes and add dense layers

We can see that we got an improvement and surpassed our Random Forest benchmark, yet we can further improve by using a more complex model involving a CNN.

Add Time Distributed 1D Convolutional layers

Added 2 1D convolutional layers, with a concatenation of max pooling and average pooling as the input for the LSTM layer.
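A sketch of this convolutional front-end in Keras. All hyperparameters (filter counts, kernel sizes, pool sizes) are assumptions, and plain `Conv1D` over the window stands in for the report's time-distributed variant; the key idea shown is concatenating max pooling and average pooling before the LSTM:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW, N_FEATURES, N_CLASSES = 200, 52, 12  # assumed dimensions

inputs = keras.Input(shape=(WINDOW, N_FEATURES))
# Two 1D convolutions extract local patterns from the raw window.
x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(inputs)
x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(x)
# Concatenate max pooling (dominant features) with average pooling
# (average features) along the channel axis before the LSTM.
x = layers.Concatenate()([layers.MaxPooling1D(pool_size=2)(x),
                          layers.AveragePooling1D(pool_size=2)(x)])
x = layers.LSTM(64)(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

probs = model.predict(np.random.randn(2, WINDOW, N_FEATURES), verbose=0)
print(probs.shape)  # (2, 12)
```

The pooling halves the sequence length to 100 steps and doubles the channel count to 128 before the LSTM summarizes it.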

Time Distributed Convolutional Usage Conclusions

We can see that we achieved a big improvement by adding time distributed 1D convolutional layers to our model (from ~52% to ~70.4%).
We can also see that the fact that 2 of the activity types are not represented in our test set hurts our testing.
One of the big observations is that the training and validation sets (which were taken from the same subset of subjects) fit almost perfectly (higher than 99%).
However, our test data achieves only approximately 70.4% accuracy and still has a large loss value.
The reason for this is probably that we used all the protocol and optional data, which caused inconsistency in the representation of the different activities across subjects (as not all subjects performed the optional activities).
If we trained the model only on the protocol activities we would not encounter that problem and could probably achieve higher accuracy and a lower loss on our test data.

We will try to train the same model again to achieve better results

Second train conclusions

We can see that the model's train/validation metrics increased very slightly. However, we had a big drop in the test data metrics. We can conclude that the model started overfitting at this point and that more training will lower our test metrics.
(We can't really notice the overfitting on the validation data since it was sampled from the same activity time frames as the training data, and therefore our model handles it well, as opposed to the test data.)

Model Performance Summary